Improving Information Retrieval System by Co-clustering Web Documents and Queries

Authors

LIU Yu-feng

LI Ren-fa

Abstract

The World Wide Web is considered the most valuable resource for information retrieval and knowledge discovery. When retrieving information for a user query, however, a search engine returns a large and unmanageable collection of documents. A more efficient way to organize these documents is to combine clustering and ranking: clustering groups the documents, and ranking orders the pages within each cluster. This paper proposes an approach to co-clustering web documents and queries. When a user issues a query, we construct a query-document bipartite graph from click-log data. We then co-cluster the web documents and queries simultaneously using bipartite spectral graph partitioning, which uses the second singular vectors of an appropriately scaled query-document matrix to yield a good bipartition, and we rank the queries and documents on the bipartite graph via an iterative process similar to HITS. Experimental results show promising improvement.
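The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It assumes the click log is a list of (query, document) pairs, that the "appropriately scaled" matrix is the degree-normalized form D1^(-1/2) A D2^(-1/2) used in standard bipartite spectral partitioning, that the bipartition is obtained by a simple one-dimensional 2-means on the second singular vectors, and that the HITS-like step is the usual mutual-reinforcement update. All function names and the toy data are hypothetical.

```python
import numpy as np

def build_click_matrix(click_log):
    """Return (A, queries, docs); A[i, j] counts clicks on docs[j] for queries[i]."""
    queries = sorted({q for q, _ in click_log})
    docs = sorted({d for _, d in click_log})
    q_index = {q: i for i, q in enumerate(queries)}
    d_index = {d: j for j, d in enumerate(docs)}
    A = np.zeros((len(queries), len(docs)))
    for q, d in click_log:
        A[q_index[q], d_index[d]] += 1.0
    return A, queries, docs

def two_means_1d(z, iters=20):
    """Split the values of z into two groups with a plain 1-D 2-means."""
    centers = np.array([z.min(), z.max()])
    for _ in range(iters):
        # Label 1 if a value is closer to the upper center, else label 0.
        labels = (np.abs(z - centers[0]) > np.abs(z - centers[1])).astype(int)
        for k in (0, 1):
            if np.any(labels == k):
                centers[k] = z[labels == k].mean()
    return labels

def spectral_bipartition(A):
    """Co-cluster rows (queries) and columns (documents) into two groups.

    Assumes every query and every document has at least one click,
    so that no degree is zero.
    """
    d1 = A.sum(axis=1)                      # query degrees
    d2 = A.sum(axis=0)                      # document degrees
    An = A / np.sqrt(np.outer(d1, d2))      # D1^(-1/2) A D2^(-1/2)
    U, _, Vt = np.linalg.svd(An, full_matrices=False)
    # Second left/right singular vectors, rescaled by the degrees.
    z = np.concatenate([U[:, 1] / np.sqrt(d1), Vt[1, :] / np.sqrt(d2)])
    labels = two_means_1d(z)
    return labels[:A.shape[0]], labels[A.shape[0]:]

def hits_like_rank(A, iters=50):
    """Mutually reinforcing scores: good queries click good documents and vice versa."""
    q = np.ones(A.shape[0])
    for _ in range(iters):
        d = A.T @ q
        d /= np.linalg.norm(d)
        q = A @ d
        q /= np.linalg.norm(q)
    return q, d

# Toy click counts (hypothetical): two topical groups joined by one stray click.
clicks = {("q0", "d0"): 5, ("q0", "d1"): 4, ("q1", "d0"): 4, ("q1", "d1"): 5,
          ("q1", "d2"): 1,
          ("q2", "d2"): 5, ("q2", "d3"): 4, ("q3", "d2"): 4, ("q3", "d3"): 5}
click_log = [pair for pair, n in clicks.items() for _ in range(n)]

A, queries, docs = build_click_matrix(click_log)
q_labels, d_labels = spectral_bipartition(A)
q_scores, d_scores = hits_like_rank(A)
```

On this toy log the two query groups and their clicked documents fall into separate co-clusters, and the scores from `hits_like_rank` could then be used to order the items within each cluster. Partitioning into more than two clusters would need additional singular vectors and a multi-dimensional clustering step, which this sketch does not cover.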


Similar resources

Investigating the Impact of Authors’ Rank in Bibliographic Networks on Expertise Retrieval

Background and Aim: This research investigates the impact of authors’ rank in bibliographic networks on the document-centered model of expertise retrieval. Its purpose is to find out what kind of author ranking in bibliographic networks can improve the performance of the document-centered model. Methodology: The current research is experimental. To operationalize research goals, a new test colle...


Improving the presentation of search results by multipartite graph clustering of multiple reformulated queries and a novel document representation

The goal of clustering web search results is to reveal the semantics of the retrieved documents. The main challenge is to make the clustering partition relevant to a user’s query. In this paper, we describe a method of clustering search results using a similarity measure between documents retrieved by multiple reformulated queries. The method produces clusters of documents that are most relevant to...


Document Clustering Using Semantic Cliques Aggregation

Search engines are indispensable tools for finding information amid massive numbers of web pages and documents. A good search engine needs to retrieve information not only quickly, but also in a way that is relevant to the users’ queries. Most search engines answer user queries quickly; however, they provide little guarantee of precision, even for highly detailed queries. In such ca...


Public Transport Ontology for Passenger Information Retrieval

Passenger information aims at improving the user-friendliness of public transport systems while influencing passenger route choices to satisfy transit user’s travel requirements. The integration of transit information from multiple agencies is a major challenge in implementation of multi-modal passenger information systems. The problem of information sharing is further compounded by the multi-l...


THUIR at TREC 2008: Relevance Feedback Track

The Tsinghua University Information Retrieval Group (THUIR) participated in the first Relevance Feedback Track of TREC 2008. The TMiner search engine was used as our text retrieval system, because its processing capability and flexibility on large text data have been demonstrated over many years of the Web Track and Terabyte Track. In the track, we studied two approaches: 1) query ...




Publication date: 2011
